Augmenting Approximate Similarity Searching with Lexical Information
Abstract
Accurately representing synonymy using distributional similarity requires large volumes of data to reliably represent infrequent words. However, the naïve nearest-neighbour approach of comparing context vectors extracted from large corpora scales poorly. The Spatial Approximation Sample Hierarchy (SASH) is a data structure for performing approximate nearest-neighbour queries, and has previously been used to improve the scalability of distributional similarity searches. We add lexical semantic information from WordNet to the SASH in an attempt to improve the accuracy and efficiency of similarity searches.
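To illustrate why the naïve approach scales poorly, the following is a minimal sketch of a brute-force nearest-neighbour search over context vectors. It is not the SASH and not the authors' implementation. The function names, the choice of cosine similarity, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse context vectors (dicts of counts)."""
    dot = sum(weight * v[ctx] for ctx, weight in u.items() if ctx in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(target, vectors, k=2):
    """Brute force: score the target against every other word, keep the top k."""
    scores = [
        (cosine(vectors[target], vec), word)
        for word, vec in vectors.items()
        if word != target
    ]
    scores.sort(reverse=True)
    return [word for _, word in scores[:k]]


# Hypothetical toy context vectors (context word -> co-occurrence count).
vectors = {
    "car": {"drive": 4, "road": 3, "wheel": 2},
    "automobile": {"drive": 3, "road": 2, "engine": 1},
    "truck": {"drive": 2, "road": 4, "load": 3},
    "banana": {"eat": 5, "yellow": 2},
}

print(nearest_neighbours("car", vectors, k=2))
```

Each query computes one similarity per word in the vocabulary, so finding neighbours for every word is quadratic in vocabulary size. An approximate index such as the SASH aims to avoid that exhaustive comparison.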
Similar resources
Augmenting WordNet-like lexical resources with distributional evidence. An application-oriented perspective
The paper deals with the issue of how and to what extent WordNet-like resources provide the necessary information for an assessment of semantic similarity which is useful for practical applications. The general point is made that taxonomical information should be complemented with distributional evidence. The claim is substantiated through experimental data and an illustration of a word sense d...
Word2Vec vs DBnary: Augmenting METEOR using Vector Representations or Lexical Resources?
This paper presents an approach combining lexico-semantic resources and distributed representations of words applied to the evaluation in machine translation (MT). This study is made through the enrichment of a well-known MT evaluation metric: METEOR. This metric enables an approximate match (synonymy or morphological similarity) between an automatic and a reference translation. Our experiments...
Developing a Semantic Similarity Judgment Test for Persian Action Verbs and Non-action Nouns in Patients With Brain Injury and Determining its Content Validity
Objective: Evidence from brain trauma suggests that the two grammatical categories of noun and verb are processed in different regions of the brain due to differences in the complexity of grammatical and semantic information processing. Studies have shown that verbs belonging to different semantic categories lead to neural activity in different areas of the brain, and action verb processing is r...
Turbo similarity searching: Effect of fingerprint and dataset on virtual-screening performance
Turbo similarity searching uses information about the nearest neighbours in a conventional chemical similarity search to increase the effectiveness of virtual screening, with a data fusion approach being used to combine the nearest-neighbour information. A previous paper suggested that the approach was highly effective in operation; this paper further tests the approach using a range of differe...